138 research outputs found

    Sparsest factor analysis for clustering variables: a matrix decomposition approach

    We propose a new procedure for sparse factor analysis (FA) such that each variable loads only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
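The two structural ideas in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' SSFA algorithm: the function names, the example loading matrix, and the scaling by the number of observations are assumptions made for illustration. The first function reads variable clusters off a loading matrix that has exactly one nonzero element per row; the second shows that, once a score matrix is written as F = QR with orthonormal Q, the cross-product F'F depends only on the triangular factor R.

```python
import numpy as np


def clusters_from_loadings(loadings):
    """Assign each variable (row) to the single factor (column) it loads on.

    Assumes the sparsest structure described in the abstract: exactly one
    nonzero element per row. Raises ValueError otherwise.
    """
    loadings = np.asarray(loadings, dtype=float)
    nonzero_per_row = np.count_nonzero(loadings, axis=1)
    if not np.all(nonzero_per_row == 1):
        raise ValueError("each row must contain exactly one nonzero loading")
    # The position of the only nonzero entry is the cluster label.
    return np.argmax(np.abs(loadings), axis=1)


def factor_cross_products_via_qr(scores):
    """Compute F'F / n from the R factor of a reduced QR decomposition.

    With F = QR and Q'Q = I, F'F = R'R, so Q is not needed here.
    The division by n is an illustrative scaling choice.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    _, r = np.linalg.qr(scores)  # reduced mode: r is (m x m)
    return r.T @ r / n


# Hypothetical example: 3 variables, 2 common factors.
example_loadings = [[0.8, 0.0], [0.7, 0.0], [0.0, -0.6]]
example_clusters = clusters_from_loadings(example_loadings)

rng = np.random.default_rng(0)
example_scores = rng.standard_normal((50, 2))
example_cross = factor_cross_products_via_qr(example_scores)
```

In this example the first two variables fall into one cluster and the third into another, and `example_cross` matches the directly computed `example_scores.T @ example_scores / 50` up to floating-point error.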

    CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom. Methodology/Principal Findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication. Conclusions/Significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/

    Can sacrificial feeding areas protect aquatic plants from herbivore grazing? Using behavioural ecology to inform wildlife management

    Effective wildlife management is needed for conservation, economic and human well-being objectives. However, traditional population control methods are frequently ineffective, unpopular with stakeholders, may affect non-target species, and can be both expensive and impractical to implement. New methods which address these issues and offer effective wildlife management are required. We used an individual-based model to predict the efficacy of a sacrificial feeding area in preventing grazing damage by mute swans (Cygnus olor) to adjacent river vegetation of high conservation and economic value. The accuracy of model predictions was assessed by a comparison with observed field data, whilst prediction robustness was evaluated using a sensitivity analysis. We used repeated simulations to evaluate how the efficacy of the sacrificial feeding area was regulated by (i) food quantity, (ii) food quality, and (iii) the functional response of the forager. Our model gave accurate predictions of aquatic plant biomass, carrying capacity, swan mortality, swan foraging effort, and river use. Our model predicted that increased sacrificial feeding area food quantity and quality would prevent the depletion of aquatic plant biomass by swans. When the functional response for vegetation in the sacrificial feeding area was increased, the food quantity and quality in the sacrificial feeding area required to protect adjacent aquatic plants were reduced. Our study demonstrates how the insights of behavioural ecology can be used to inform wildlife management. The principles that underpin our model predictions are likely to be valid across a range of different resource-consumer interactions, emphasising the generality of our approach to the evaluation of strategies for resolving wildlife management problems

    Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which is caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as asthma endotypes. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique, latent class analysis, and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.

    Genomic analysis of the function of the transcription factor gata3 during development of the Mammalian inner ear

    We have studied the function of the zinc finger transcription factor gata3 in auditory system development by analysing temporal profiles of gene expression during differentiation of conditionally immortal cell lines derived to model specific auditory cell types and developmental stages. We tested and applied a novel probabilistic method called the gamma Model for Oligonucleotide Signals to analyse hybridization signals from Affymetrix oligonucleotide arrays. Expression levels estimated by this method correlated closely (p<0.0001) across a 10-fold range with those measured by quantitative RT-PCR for a sample of 61 different genes. In an unbiased list of 26 genes whose temporal profiles clustered most closely with that of gata3 in all cell lines, 10 were linked to Insulin-like Growth Factor signalling, including the serine/threonine kinase Akt/PKB. Knock-down of gata3 in vitro was associated with a decrease in expression of genes linked to IGF-signalling, including IGF1, IGF2 and several IGF-binding proteins. It also led to a small decrease in protein levels of the serine-threonine kinase Akt2/PKB beta, a dramatic increase in Akt1/PKB alpha protein and relocation of Akt1/PKB alpha from the nucleus to the cytoplasm. The cyclin-dependent kinase inhibitor p27(kip1), a known target of PKB/Akt, simultaneously decreased. In heterozygous gata3 null mice the expression of gata3 correlated with high levels of activated Akt/PKB. This functional relationship could explain the diverse function of gata3 during development, the hearing loss associated with gata3 heterozygous null mice and the broader symptoms of human patients with Hearing-Deafness-Renal anomaly syndrome

    CDK targets Sae2 to control DNA-end resection and homologous recombination

    DNA double-strand breaks (DSBs) are repaired by two principal mechanisms: non-homologous end-joining (NHEJ) and homologous recombination (HR) (ref. 1). HR is the most accurate DSB repair mechanism but is generally restricted to the S and G2 phases of the cell cycle, when DNA has been replicated and a sister chromatid is available as a repair template (refs 2-5). By contrast, NHEJ operates throughout the cell cycle but assumes most importance in G1 (refs 4, 6). The choice between repair pathways is governed by cyclin-dependent protein kinases (CDKs) (refs 2, 3, 5, 7), with a major site of control being at the level of DSB resection, an event that is necessary for HR but not NHEJ, and which takes place most effectively in S and G2 (refs 2, 5). Here we establish that cell-cycle control of DSB resection in Saccharomyces cerevisiae results from the phosphorylation by CDK of an evolutionarily conserved motif in the Sae2 protein. We show that mutating Ser 267 of Sae2 to a non-phosphorylatable residue causes phenotypes comparable to those of a sae2Δ null mutant, including hypersensitivity to camptothecin, defective sporulation, reduced hairpin-induced recombination, severely impaired DNA-end processing and faulty assembly and disassembly of HR factors. Furthermore, a Sae2 mutation that mimics constitutive Ser 267 phosphorylation complements these phenotypes and overcomes the necessity of CDK activity for DSB resection. The Sae2 mutations also cause cell-cycle-stage specific hypersensitivity to DNA damage and affect the balance between HR and NHEJ. These findings therefore provide a mechanistic basis for cell-cycle control of DSB repair and highlight the importance of regulating DSB resection.

    Competition between Replicative and Translesion Polymerases during Homologous Recombination Repair in Drosophila

    In metazoans, the mechanism by which DNA is synthesized during homologous recombination repair of double-strand breaks is poorly understood. Specifically, the identities of the polymerase(s) that carry out repair synthesis and how they are recruited to repair sites are unclear. Here, we have investigated the roles of several different polymerases during homologous recombination repair in Drosophila melanogaster. Using a gap repair assay, we found that homologous recombination is impaired in Drosophila lacking DNA polymerase zeta and, to a lesser extent, polymerase eta. In addition, the Pol32 protein, part of the polymerase delta complex, is needed for repair requiring extensive synthesis. Loss of Rev1, which interacts with multiple translesion polymerases, results in increased synthesis during gap repair. Together, our findings support a model in which translesion polymerases and the polymerase delta complex compete during homologous recombination repair. In addition, they establish Rev1 as a crucial factor that regulates the extent of repair synthesis

    The distribution of inverted repeat sequences in the Saccharomyces cerevisiae genome

    Although a variety of possible functions have been proposed for inverted repeat sequences (IRs), it is not known which of them might occur in vivo. We investigate this question by assessing the distributions and properties of IRs in the Saccharomyces cerevisiae (SC) genome. Using the IRFinder algorithm we detect 100,514 IRs having copy length greater than 6 bp and spacer length less than 77 bp. To assess statistical significance we also determine the IR distributions in two types of randomization of the S. cerevisiae genome. We find that the S. cerevisiae genome is significantly enriched in IRs relative to random. The S. cerevisiae IRs are significantly longer and contain fewer imperfections than those from the randomized genomes, suggesting that processes to lengthen and/or correct errors in IRs may be operative in vivo. The S. cerevisiae IRs are highly clustered in intergenic regions, while their occurrence in coding sequences is consistent with random. Clustering is stronger in the 3′ flanks of genes than in their 5′ flanks. However, the S. cerevisiae genome is not enriched in those IRs that would extrude cruciforms, suggesting that this is not a common event. Various explanations for these results are considered
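The search criteria quoted in this abstract (copy length greater than 6 bp, spacer length less than 77 bp) can be illustrated with a minimal Python sketch. It is not the IRFinder algorithm: it reports only perfect inverted repeats of one fixed copy length and does not extend them or tolerate imperfections, and the function names and the example sequence are hypothetical.

```python
# Translation table for complementing DNA bases (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, copy_length=7, max_spacer=76):
    """Find perfect inverted repeats by brute force.

    Returns a list of (left_start, copy_length, spacer_length) tuples where
    the copy starting at left_start is followed, after spacer_length bases,
    by its reverse complement. The defaults correspond to the smallest copy
    length above 6 bp and the largest spacer below 77 bp.
    """
    hits = []
    for start in range(len(seq) - 2 * copy_length + 1):
        left = seq[start:start + copy_length]
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + copy_length + spacer
            right_end = right_start + copy_length
            if right_end > len(seq):
                break  # the right copy would run past the end of the sequence
            if seq[right_start:right_end] == target:
                hits.append((start, copy_length, spacer))
    return hits


# Hypothetical example: a 7 bp copy, a 3 bp spacer, then its reverse complement.
example_sequence = "GATTACA" + "TTT" + "TGTAATC"
example_hits = find_inverted_repeats(example_sequence)
```

The brute-force scan is quadratic in the spacer window and is only meant to make the definition concrete; a genome-scale search would need indexing and a model of imperfect repeats.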

    Are We Predicting the Actual or Apparent Distribution of Temperate Marine Fishes?

    Planning for resilience is the focus of many marine conservation programs and initiatives. These efforts aim to inform conservation strategies for marine regions to ensure they have inbuilt capacity to retain biological diversity and ecological function in the face of global environmental change, particularly changes in climate and resource exploitation. In the absence of direct biological and ecological information for many marine species, scientists are increasingly using spatially-explicit, predictive-modeling approaches. Through improved access to multibeam sonar and underwater video technology, these models provide spatial predictions of the most suitable regions for an organism at resolutions previously not possible. However, sensible-looking, well-performing models can provide very different predictions of distribution depending on which occurrence dataset is used. To examine this, we construct species distribution models for nine temperate marine sedentary fishes for a 25.7 km² study region off the coast of southeastern Australia. We use generalized linear model (GLM), generalized additive model (GAM) and maximum entropy (MAXENT) to build models based on co-located occurrence datasets derived from two underwater video methods (i.e. baited and towed video) and fine-scale multibeam sonar based seafloor habitat variables. Overall, this study found that the choice of modeling approach did not considerably influence the prediction of distributions based on the same occurrence dataset. However, greater dissimilarity between model predictions was observed across the nine fish taxa when the two occurrence datasets were compared (relative to models based on the same dataset). Based on these results it is difficult to identify any general trends with regard to which video method provides more reliable occurrence datasets. Nonetheless, we suggest that the predictions reflect the species' apparent distribution (i.e. a combination of species distribution and the probability of detecting it). Consequently, we also encourage researchers and marine managers to carefully interpret model predictions.